133 research outputs found

    LuxRep: a technical replicate-aware method for bisulfite sequencing data analysis

    Background: DNA methylation is commonly measured using bisulfite sequencing (BS-seq). The quality of a BS-seq library is measured by its bisulfite conversion efficiency. Libraries with low conversion rates are typically excluded from analysis, resulting in reduced coverage and increased costs. Results: We have developed a probabilistic method and software, LuxRep, that implements a general linear model and simultaneously accounts for technical replicates (libraries from the same biological sample) from different bisulfite-converted DNA libraries. Using simulations and actual DNA methylation data, we show that including technical replicates with low bisulfite conversion rates generates more accurate estimates of methylation levels and differentially methylated sites. Moreover, using variational inference reduces the computation time necessary for whole-genome analysis. Conclusions: In this work we show that taking into account technical replicates (i.e. libraries) of BS-seq data of varying bisulfite conversion rates, with their corresponding experimental parameters, improves methylation level estimation and differential methylation detection.
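
To illustrate why conversion efficiency matters, the sketch below uses a deliberately simplified relationship, not LuxRep's actual model: it assumes that methylated cytosines are always read as C, that unmethylated cytosines are read as C only when conversion fails, and that there is no sequencing error. The function names are hypothetical.

```python
def observed_c_fraction(theta, conversion_rate):
    """Expected fraction of 'C' reads at a site with methylation level theta.

    Simplifying assumptions (not from the abstract): methylated cytosines are
    never converted, and sequencing errors are ignored.
    """
    return theta + (1.0 - theta) * (1.0 - conversion_rate)


def corrected_methylation(p_obs, conversion_rate):
    """Invert the relationship above to recover theta, clamped to [0, 1]."""
    if conversion_rate <= 0.0:
        raise ValueError("conversion_rate must be positive")
    theta = (p_obs - (1.0 - conversion_rate)) / conversion_rate
    return min(1.0, max(0.0, theta))
```

Under these assumptions, a library with a 90% conversion rate inflates a true methylation level of 0.6 to an observed C fraction of 0.64, and the correction recovers the original value.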

    Dynamic modeling of gene expression in prokaryotes: application to glucose-lactose diauxie in Escherichia coli

    Coexpression of genes or, more generally, similarity in the expression profiles poses an insurmountable obstacle to inferring the gene regulatory network (GRN) based solely on data from DNA microarray time series. Clustering of genes with similar expression profiles allows for a coarse-grained view of the GRN and a probabilistic determination of the connectivity among the clusters. We present a model for the temporal evolution of a gene cluster network which takes into account interactions of gene products with genes and, through a non-constant degradation rate, with other gene products. The number of model parameters is reduced by using polynomial functions to interpolate temporal data points. In this manner, the task of parameter estimation is reduced to a system of linear algebraic equations, thus making the computation time shorter by orders of magnitude. To eliminate irrelevant networks, we test each GRN for stability with respect to parameter variations, and impose restrictions on its behavior near the steady state. We apply our model and methods to DNA microarray time-series data collected on Escherichia coli during glucose-lactose diauxie and infer the most probable cluster network for different phases of the experiment. Comment: 20 pages, 4 figures; Systems and Synthetic Biology 5 (2011

    Associations of age and sex with brain volumes and asymmetry in 2–5-week-old infants

    Information on normal brain structure and development facilitates the recognition of abnormal developmental trajectories and thus needs to be studied in more detail. We imaged 68 healthy infants aged 2–5 weeks with high-resolution structural MRI (magnetic resonance imaging) and investigated hemispheric asymmetry as well as the associations of various total and lobar brain volumes with infant age and sex. We found similar hemispheric asymmetry in both sexes, seen as larger volumes of the right temporal lobe, and of the left parietal and occipital lobes. The degree of asymmetry did not vary with age. Regardless of controlling for gestational age, gray and white matter had different age-related growth patterns. This is a reflection of gray matter growth being greater in the first years, while white matter growth extends into early adulthood. Sex-dependent differences were seen in gray matter as larger regional absolute volumes in males and as larger regional relative volumes in females. Our results are in line with previous studies and expand our understanding of infant brain development.

    Resting-state networks of the neonate brain identified using independent component analysis

    Resting-state functional magnetic resonance imaging (rs-fMRI) has been successfully used to probe the intrinsic functional organization of the brain and to study brain development. Here, we implemented a combination of individual and group independent component analysis (ICA) of FSL on a 6-min resting-state data set acquired from 21 naturally sleeping term-born (age 26 ± 6.7 d), healthy neonates to investigate the emerging functional resting-state networks (RSNs). In line with the previous literature, we found evidence of sensorimotor, auditory/language, visual, cerebellar, thalamic, parietal, prefrontal, anterior cingulate as well as dorsal and ventral aspects of the default-mode network. Additionally, we identified RSNs in frontal, parietal, and temporal regions that have not been previously described in this age group and correspond to the canonical RSNs established in adults. Importantly, we found that careful ICA-based denoising of fMRI data increased the number of networks identified with group-ICA, whereas the degree of spatial smoothing did not change the number of identified networks. Our results show that the infant brain has an established set of RSNs soon after birth.

    Genome-wide association study of primary tooth eruption identifies pleiotropic loci associated with height and craniofacial distances

    Twin and family studies indicate that the timing of primary tooth eruption is highly heritable, with estimates typically exceeding 80%. To identify variants involved in primary tooth eruption we performed a population-based genome-wide association study of ‘age at first tooth’ and ‘number of teeth’ using 5998 and 6609 individuals respectively from the Avon Longitudinal Study of Parents and Children (ALSPAC) and 5403 individuals from the 1966 Northern Finland Birth Cohort (NFBC1966). We tested 2,446,724 SNPs imputed in both studies. Analyses were controlled for the effect of gestational age, sex and age of measurement. Results from the two studies were combined using fixed effects inverse variance meta-analysis. We identified a total of fifteen independent loci, with ten loci reaching genome-wide significance (p < 5×10⁻⁸) for ‘age at first tooth’ and eleven loci for ‘number of teeth’. Together these associations explain 6.06% of the variation in ‘age at first tooth’ and 4.76% of the variation in ‘number of teeth’. The identified loci included eight previously unidentified loci, some containing genes known to play a role in tooth and other developmental pathways, including a SNP in the protein-coding region of BMP4 (rs17563, P = 9.080×10⁻¹⁷). Three of these loci, containing the genes HMGA2, AJUBA and ADK, also showed evidence of association with craniofacial distances, particularly those indexing facial width. Our results suggest that the genome-wide association approach is a powerful strategy for detecting variants involved in tooth eruption, and potentially craniofacial growth and more generally organ development.
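
The fixed effects inverse variance meta-analysis mentioned above can be sketched as follows. This is a generic textbook version for a single SNP, not the authors' pipeline; the function name and the example numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Combine per-study effect estimates using inverse-variance weights.

    Returns the pooled effect, its standard error, the z-score, and a
    two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-study estimates for one SNP (not taken from the paper).
beta, se, z, p = fixed_effects_meta([0.10, 0.20], [0.05, 0.05])
```

Studies with smaller standard errors receive proportionally larger weights, so the pooled estimate is dominated by the more precise cohort.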

    A Bayesian Search for Transcriptional Motifs

    Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to use gaplessly aligned, experimentally supported TFBSs for a particular TF, and algorithmically search for more occurrences of the same TFBSs. The largest publicly available databases of TF binding specificities contain models which are represented as position weight matrices (PWMs). There are other methods using more sophisticated representations, but these have more limited databases, or aren't publicly available. Therefore, this paper focuses on methods that search using one PWM per TF. An algorithm, MATCH™, for identifying TFBSs corresponding to a particular PWM is available, but is not based on a rigorous statistical model of TF binding, making it difficult to interpret or adjust the parameters and output of the algorithm. Furthermore, there is no public description of the algorithm sufficient to exactly reproduce it. Another algorithm, MAST, computes a p-value for the presence of a TFBS using true probabilities of finding each base at each offset from that position. We developed a statistical model, BaSeTraM, for the binding of TFs to TFBSs, taking into account random variation in the base present at each position within a TFBS. Treating the counts in the matrices and the sequences of sites as random variables, we combine this TFBS composition model with a background model to obtain a Bayesian classifier. We implemented our classifier in a package (SBaSeTraM). We tested SBaSeTraM against a MATCH™ implementation by searching all probes used in an experimental Saccharomyces cerevisiae TF binding dataset, and comparing our predictions to the data. We found no statistically significant differences in sensitivity between the algorithms (at fixed selectivity), indicating that SBaSeTraM's performance is at least comparable to the leading currently available algorithm.
Our software is freely available at: http://wiki.github.com/A1kmm/sbasetram/building-the-tools
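
As a rough illustration of combining a site composition model with a background model in a Bayesian classifier, the sketch below scores a candidate sequence against a PWM. It is a much simpler stand-in than BaSeTraM: it uses fixed pseudocount-smoothed base probabilities rather than treating the matrix counts as random variables, and all names, the pseudocount, and the prior are assumptions.

```python
import math

BASES = "ACGT"


def pwm_from_counts(counts, pseudocount=1.0):
    """Turn per-position base counts into smoothed base probabilities."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: (column.get(b, 0) + pseudocount) / total for b in BASES})
    return pwm


def posterior_site(seq, pwm, background, prior):
    """Posterior probability that seq is a binding site rather than background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the PWM width")
    log_site = sum(math.log(col[b]) for col, b in zip(pwm, seq))
    log_background = sum(math.log(background[b]) for b in seq)
    # Log posterior odds = log likelihood ratio + log prior odds.
    log_odds = log_site - log_background + math.log(prior / (1.0 - prior))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical two-position motif "AC" with a uniform background.
pwm = pwm_from_counts([{"A": 10}, {"C": 10}])
uniform = {b: 0.25 for b in BASES}
```

With a prior of 0.5, `posterior_site("AC", pwm, uniform, 0.5)` is high and `posterior_site("GT", pwm, uniform, 0.5)` is low; in practice the prior for a site at any given position would be far smaller.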

    A Linear Model for Transcription Factor Binding Affinity Prediction in Protein Binding Microarrays

    Protein binding microarrays (PBMs) are a high-throughput technology used to characterize protein-DNA binding. The arrays measure a protein's affinity toward thousands of double-stranded DNA sequences at once, producing a comprehensive binding specificity catalog. We present a linear model for predicting the binding affinity of a protein toward DNA sequences based on PBM data. Our model represents the measured intensity of an individual probe as a sum of the binding affinity contributions of the probe's subsequences. These subsequences characterize a DNA binding motif and can be used to predict the intensity of protein binding against arbitrary DNA sequences. Our method was the best performer in the Dialogue for Reverse Engineering Assessments and Methods 5 (DREAM5) transcription factor/DNA motif recognition challenge. For the DREAM5 bonus challenge, we also developed an approach for the identification of transcription factors based on their PBM binding profiles. Our approach for TF identification achieved the best performance in the bonus challenge.
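
The additive idea in this abstract, a probe intensity expressed as a sum of subsequence contributions, can be sketched as a prediction step. The fixed subsequence length `k`, the weight values, and the function name are hypothetical; the abstract does not state how subsequences are chosen or how their contributions are fitted.

```python
def predict_intensity(probe, weights, k):
    """Sum the contributions of every length-k subsequence of a probe.

    Subsequences missing from `weights` are assumed to contribute zero.
    """
    return sum(
        weights.get(probe[i:i + k], 0.0)
        for i in range(len(probe) - k + 1)
    )


# Hypothetical contributions for two 3-mers (illustrative values only).
weights = {"ACG": 1.0, "CGT": 0.5}
intensity = predict_intensity("ACGT", weights, k=3)  # 1.0 + 0.5
```

Because the model is linear in the weights, they could in principle be estimated from measured probe intensities by ordinary least squares, though the fitting procedure actually used is not described here.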

    Robust computational reconstitution – a new method for the comparative analysis of gene expression in tissues and isolated cell fractions

    BACKGROUND: Biological tissues consist of various cell types that differentially contribute to physiological and pathophysiological processes. Determining and analyzing cell type-specific gene expression under diverse conditions is therefore a central aim of biomedical research. The present study compares gene expression profiles in whole tissues and isolated cell fractions purified from these tissues in patients with rheumatoid arthritis and osteoarthritis. RESULTS: The expression profiles of the whole tissues were compared to computationally reconstituted expression profiles that combine the expression profiles of the isolated cell fractions (macrophages, fibroblasts, and non-adherent cells) according to their relative mRNA proportions in the tissue. The mRNA proportions were determined by trimmed robust regression using only the most robustly-expressed genes (1/3 to 1/2 of all measured genes), i.e. those showing the most similar expression in tissue and isolated cell fractions. The relative mRNA proportions were determined using several different chip evaluation methods, among which the MAS 5.0 signal algorithm appeared to be most robust. The computed mRNA proportions agreed well with the cell proportions determined by immunohistochemistry except for a small number of outliers. Genes that were either regulated (i.e. differentially-expressed in tissue and isolated cell fractions) or robustly-expressed in all patients were identified using different test statistics. CONCLUSION: Robust Computational Reconstitution uses an intermediate number of robustly-expressed genes to estimate the relative mRNA proportions. This avoids both the exclusive dependence on the robust expression of individual, highly cell type-specific marker genes and the bias towards an equal distribution upon inclusion of all genes for computation.

    Partial Support for an Interaction Between a Polygenic Risk Score for Major Depressive Disorder and Prenatal Maternal Depressive Symptoms on Infant Right Amygdalar Volumes

    Psychiatric disease susceptibility partly originates prenatally and is shaped by an interplay of genetic and environmental risk factors. A recent study has provided preliminary evidence that an offspring polygenic risk score for major depressive disorder (PRS-MDD), based on European ancestry, interacts with prenatal maternal depressive symptoms (GxE) on neonatal right amygdalar (US and Asian cohort) and hippocampal volumes (Asian cohort). However, to date, this GxE interplay has only been addressed by one study and has not yet been examined in a European ancestry sample. We investigated in 105 Finnish mother-infant dyads (44 female, 11-54 days old) how offspring PRS-MDD interacts with prenatal maternal depressive symptoms (Edinburgh Postnatal Depression Scale, gestational weeks 14, 24, 34) on infant amygdalar and hippocampal volumes. We found a GxE effect on right amygdalar volumes, significant in the main analysis, but nonsignificant after multiple comparison correction and some of the control analyses, whose direction paralleled the US cohort findings. Additional exploratory analyses suggested a sex-specific GxE effect on right hippocampal volumes. Our study is the first to provide support, though statistically weak, for an interplay of offspring PRS-MDD and prenatal maternal depressive symptoms on infant limbic brain volumes in a cohort matched to the PRS-MDD discovery sample.